
Human Genomics

Springer Science and Business Media LLC

Preprints posted in the last 90 days, ranked by how well they match Human Genomics's content profile, based on 13 papers previously published here. The average preprint has a 0.05% match score for this journal, so anything above that is already an above-average fit.

1
PATHOS: Predicting Variant Pathogenicity by Combining Protein Language Models and Biological Features

Radjasandirane, R.; Cretin, G.; Diharce, J.; de Brevern, A. G.; Gelly, J.-C.

2025-12-27 genetic and genomic medicine 10.64898/2025.12.22.25342839
Top 0.1%
82× avg

Predicting the pathogenic impact of missense variants is essential for understanding and diagnosing genetic diseases. Prediction approaches have evolved significantly, with the latest methods based on deep learning. Nonetheless, only a limited number exploit the potential of Protein Language Models (PLMs), which have demonstrated strong performance across various protein-related tasks. A new predictor, called PATHOS, was developed; it combines embeddings from an optimal set of two PLMs, namely ESM C 600M and Ankh 2 Large. Their embeddings were combined with additional crucial biological features such as phylogenetic probabilities, allele frequency, and protein annotations; they were aggregated using a fully connected layer architecture. Compared with 65 other predictors on clinical data, PATHOS outperforms the state of the art. It achieves a Matthews Correlation Coefficient (MCC) of 0.591 on a manually and carefully curated clinical dataset and 0.826 on a ClinVar dataset, surpassing other leading tools. Furthermore, case studies on the progesterone receptor and the KCNQ1 ion channel illustrate that PATHOS can identify functionally critical regions and known pathogenic mutations missed by other leading predictors like AlphaMissense. To ensure broad accessibility and facilitate use by non-specialists, a user-friendly web server containing a database of 140 million precomputed predictions for human proteins from Swiss-Prot is provided. The web server is available at: https://dsimb.inserm.fr/PATHOS/
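For readers unfamiliar with the metric, the Matthews Correlation Coefficient quoted in the abstract is computed from the four cells of a binary confusion matrix. The following is a minimal Python sketch of that formula; the counts are hypothetical and are not taken from the PATHOS evaluation.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal total is zero.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for illustration only (pathogenic = positive class).
print(round(mcc(tp=80, tn=85, fp=15, fn=20), 3))
```

MCC ranges from -1 (complete disagreement) through 0 (no better than chance) to 1 (perfect prediction), and it uses all four cells of the matrix, which is why it is often preferred over accuracy on imbalanced variant datasets.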

2
Methicillin-Susceptible Staphylococcus aureus ST398 in atopic dermatitis in Portugal displays pathogenic traits associated with impaired skin barrier function

Caieiro, D.; Faria, N. A.; Botelho, A.; Araujo, M.; Ramos, L.; Calvao, J.; Goncalo, M.; Miragaia, M.

2026-02-18 dermatology 10.64898/2026.02.17.26346495
Top 0.1%
52× avg

Staphylococcus aureus plays a central role in the exacerbation of atopic dermatitis (AD), but the population structure and pathogenic determinants of strains colonizing AD patients remain poorly understood. It is unclear whether these strains mirror those circulating in the general community or whether specific clonal lineages are selectively adapted to the AD skin microenvironment. Data addressing this question are scarce, particularly in Portugal. In this study, we investigated the molecular epidemiology and pathogenic traits of S. aureus colonizing skin lesions in adult patients with AD in Portugal. We found that lesion-associated isolates belonged predominantly to the methicillin-susceptible S. aureus MSSA-ST398 clonal type, a lineage that is widely circulating in the Portuguese community, particularly among vulnerable populations, and that has also been implicated in severe human infections. Notably, isolates from this clonal type in AD harboured specific pathogenicity traits associated with skin barrier disruption, including hemolysin and urease production, which may contribute to their success as colonizers in AD. Our findings highlight that S. aureus colonization in AD arises from a dynamic interplay between community-level molecular epidemiology and disease-specific selective pressures. While circulating lineages provide the genetic background diversity, the AD skin microenvironment appears to shape which clones ultimately become dominant. Such an integrated perspective may help to inform future geographically tailored strategies aimed at limiting bacterial burden and preventing disease exacerbation in AD.

3
The landscape of structural variants in male infertility identified by optical genome mapping

Kovanda, A.; Hodzic, A.; Kotnik, U.; Visnjar, T.; Podgrajsek, R.; Andjelic, A.; Jaklic, H.; Maver, A.; Lovrecic, L.; Peterlin, B.

2026-03-02 genetic and genomic medicine 10.64898/2026.02.27.26347236
Top 0.2%
49× avg

STUDY QUESTION[Do structural genomic variants that can be identified using optical genome mapping contribute to male infertility?] SUMMARY ANSWER[By using optical genome mapping we can identify several types of structural variants, both known and new, that may contribute to male infertility.] WHAT IS KNOWN ALREADY[Traditional approaches such as karyotyping, CFTR and chromosome Y microdeletion testing are successful in explaining clinical findings in ~30% of male infertility (MI) patients, leaving the rest without a genetic diagnosis. Recent research suggests at least 265 genes may play a role in male fertility. While the assessment of the roles of copy number variants and single nucleotide variants in monogenic forms of disease in these genes is underway, much less is known about structural variants.] STUDY DESIGN, SIZE, DURATION[We performed a longitudinal case/control study on a total of 220 individuals: 88 patients with male infertility who tested negative for cytogenetic abnormalities by karyotyping and for chrY microdeletions and CFTR gene variants by molecular testing, and 132 healthy male individuals who underwent optical genome mapping for other reasons. Exclusion criteria for the control cohort were low sperm quality and/or inclusion in IVF procedures. The study was approved by the National Medical Ethics Committee of the Republic of Slovenia (reference number: 0120-213/2022/6). Optical genome mapping was performed from an aliquot of whole blood collected for routine testing purposes at the Clinical Institute of Genomic Medicine (CIGM), UMC Ljubljana from January 2023 to November 2024.] PARTICIPANTS/MATERIALS, SETTING, METHODS[We examined structural variants in 220 participants by using optical genome mapping, which was performed with DLE-1 SP-G2 chemistry and the Saphyr instrument. 
The de novo assembly and Variant Annotation Pipeline were executed on Bionano Solve3.7_20221013_25, while reporting and direct visualization of structural variants were done on Bionano Access 1.7.2. All obtained variants were filtered using the Bionano Access software and in-house generated gene/regions of interest panel bed files. The first filter was applied to include variants below a population frequency of 10% that overlapped the regions of interest. Subsequently, all variants occurring with frequency 0% in the internal manufacturer variant dataset were manually evaluated for possible involvement of the overlapping genes or regions in biological processes involved in MI. The male infertility cohort also underwent research whole exome analyses as previously reported. All results of optical genome mapping were confirmed by an appropriate alternative method where available.] MAIN RESULTS AND THE ROLE OF CHANCE[We show that the overall number of structural variants in MI patients does not differ from that of healthy individuals. By looking in detail at genes and regions associated with MI, we identified 21 rare variants absent from controls in 25.0% of MI patients, of which five were likely causative, and two would be missed by using traditional approaches. These variants include inversions, duplications, amplifications, deletions (e.g. SPAG1), and insertions/expansions (e.g. DMPK) that were validated using additional methods. While the remaining SVs cannot currently be classified as pathogenic according to existing criteria, they open a new avenue in genetic research of MI.] 
LARGE SCALE DATA[Variants reported in this study were deposited into ClinVar under accession numbers SUB15650956 (https://www.ncbi.nlm.nih.gov/clinvar/)] LIMITATIONS, REASONS FOR CAUTION[Technical limitations of optical genome mapping include the lack of DLE-1 labelling of centromeric and telomeric regions, the inability to detect Robertsonian translocations, the unclear exact location of smaller structural variants located between the DLE-1 labels, and unclear boundaries in case of their location in segmentally duplicated regions (this limitation is shared with other methods). The ACMG criteria of rarity are also hard to apply, as the fertility status of the individuals in healthy population databases such as GnomAD and DGV is unknown. Similarly, gene-associated phenotype and the proposed inheritance model both need to be considered as parts of the ACMG criteria, but for many candidate genes associated with MI, no model of inheritance has yet been proposed.] WIDER IMPLICATIONS OF THE FINDINGS[Currently, with the established diagnostic approaches we are able to resolve ~30% of male infertility cases, with ~70% of patients remaining undiagnosed. The significance of our work is in showing that rare structural variants can be identified in MI by using optical genome mapping, opening new avenues of research into the genetics of this important contributor to human fertility.] STUDY FUNDING/COMPETING INTEREST(S)[All authors declare having no conflict of interest in regard to this research. This work was funded by the Slovenian Research and Innovation Agency (ARIS) Programme grant P3-0326: Gynecology and Reproduction: Genomics for personalized medicine] Lay summary: Male infertility affects about 5% of adult males and has complex causes, including genetic ones, such as mutations in the CFTR gene, small deletions on chromosome Y, and balanced translocations, but currently we can only find a genetic cause in ~30% of patients. 
This means ~70% of cases remain undiagnosed, but they too may have an as yet unknown genetic cause. Indeed, research so far has proposed at least 265 genes as playing a role in male fertility. In these genes, there has so far been limited research on single nucleotide variants and copy number variants, and many structural variants are not visible using the methods commonly used in clinical genetic testing. Therefore, apart from chromosome Y microdeletions and chromosomal numerical and structural anomalies, such as balanced translocations, the role of smaller structural variants in male infertility is unknown, but based on what we know from other diseases, they may also play a role in male infertility. Optical genome mapping is a novel method for the detection of structural variants, such as balanced and unbalanced translocations, insertions, duplications, deletions, and complex structural rearrangements in a wide range of sizes. By using optical genome mapping to test a cohort of 88 infertile men and 132 healthy controls, we aimed to provide the first insights into the range of SVs that may be associated with MI. Using optical genome mapping, we found that the overall number of structural variants in MI patients was not significantly different from that in the control group. However, by looking at genes and regions associated with MI, we can find rare structural variants that are absent from controls in 25.0% of MI patients. These variants include inversions, duplications, amplifications, deletions (e.g. deletion in SPAG1), and insertions/expansions (e.g. in DMPK) that were validated using additional methods. Five of these variants (5.6%) were likely causative, and two would be missed by traditional approaches. While the remaining SVs cannot currently be classified as pathogenic according to existing criteria, they open a new avenue in genetic research of MI.

4
HFE p.C282Y (rs1800562) allele frequencies in 33 population/control cohorts in Iberia

Barton, J. C.; Barton, J. C.; Acton, R. T.

2025-12-21 genetic and genomic medicine 10.64898/2025.12.19.25342681
Top 0.3%
42× avg

Background: HFE p.C282Y (c.845G>A; rs1800562) is a common missense mutation in persons of European ancestry, but we found no comprehensive tabulation of p.C282Y allele frequencies in Iberia. Methods: We performed computerized and manual searches to identify evaluable reports of p.C282Y alleles in population/control cohorts of ≥50 subjects in Iberia. We tabulated numbers of subjects, nominal geographic sites of cohort recruitment, cohort characteristics, corresponding latitudes and longitudes, and p.C282Y allele frequencies [95% confidence intervals]. We computed the aggregate p.C282Y allele frequencies of mainland Spain and mainland Portugal and compared the aggregate frequencies using the Chi-square test (two-tailed). Using combined mainland Spain and mainland Portugal data, we computed the aggregate p.C282Y allele frequency in Iberia. Results: We identified 25 cohorts in mainland Spain (12,297 subjects; 11 of the 15 autonomous communities) and nine cohorts in mainland Portugal (1,024 subjects; each of the five administrative regions). Cohorts were recruited in this region: latitude 43.4619 - 37.2299° N; longitude -9.1366 - 2.1899° W. The range of p.C282Y allele frequencies in the 34 cohorts was 0.0000 to 0.0517. The aggregate p.C282Y allele frequency in mainland Spain was 0.0291 (716/24,594) [0.0271, 0.0313] and that in mainland Portugal was 0.0303 (62/2048) [0.0237, 0.0386] (p = 0.8343). The aggregate p.C282Y allele frequency in Iberia was 0.0292 (778/26,642) [0.0272, 0.0313]. Conclusions: We conclude that the aggregate HFE p.C282Y allele frequencies in mainland Spain and mainland Portugal do not differ significantly. The aggregate p.C282Y allele frequency of 34 population/control cohorts (13,321 subjects, 16 geographic regions) in Iberia is 0.0292.
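The aggregate frequencies above are allele counts divided by twice the number of subjects, each with a 95% confidence interval. The abstract does not say which interval method was used, so the Python sketch below assumes a Wilson score interval; the function name and the choice of z = 1.96 are illustrative, not the authors' stated procedure.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Allele counts from the abstract; each subject contributes two alleles.
cohorts = {
    "mainland Spain": (716, 24594),
    "mainland Portugal": (62, 2048),
    "Iberia": (778, 26642),
}
for name, (alt, total) in cohorts.items():
    lo, hi = wilson_interval(alt, total)
    print(f"{name}: {alt / total:.4f} [{lo:.4f}, {hi:.4f}]")
```

Under this assumption the computed bounds appear to agree closely with the intervals reported in the abstract, which suggests (but does not confirm) that a Wilson-type interval was used.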

5
Monogenic Syndromes as a Cause of Adverse Drug Reactions in the Russian Population

Buianova, A. A.; Cheranev, V. V.; Shmitko, A. O.; Vasiliadis, I. A.; Ilyina, G. A.; Suchalko, O. N.; Kuznetsov, M. I.; Belova, V. A.; Korostin, D. O.

2026-02-17 genetic and genomic medicine 10.64898/2026.02.13.26346297
Top 0.3%
38× avg

Introduction: Adverse drug reactions (ADRs) remain a major public health issue, and genetic factors contribute importantly to interindividual variability in drug response. Pharmacogenetic testing helps reduce ADR risk by optimizing drug selection and dosage, particularly in monogenic disorders. Material and Methods: Whole-exome sequencing of 6,739 samples from the Russian population was performed using the MGIEasy Universal DNA Library Prep Set on the DNBSEQ-G400 platform (MGI). Variants in 48 genes were examined, focusing on inherited arrhythmias (Long QT syndrome, Short QT syndrome, Timothy syndrome, Andersen-Tawil syndrome, Brugada syndrome, Atrial fibrillation, Catecholaminergic polymorphic ventricular tachycardia), enzyme deficiencies (Glucose-6-Phosphate Dehydrogenase Deficiency [G6PDD], Porphyrias), Dravet Syndrome (DS) and Malignant Hyperthermia (MH). All identified variants had been reported at least once as pathogenic (P) or likely pathogenic (LP) in ClinVar, along with those occasionally classified as variants of uncertain significance (VUS). Each variant was manually re-evaluated according to ACMG criteria. Results: A total of 75 unique variants in 18 genes were observed in 119 individuals (1.77%), including 21 carriers and 13 women with a G6PD mutation. Of these, 46 variants were classified as P, 21 as LP, and 8 as VUS. Missense variants accounted for the largest proportion (73.33%). The most affected genes were KCNQ1 (24/119), which exhibited the highest number of unique variants (18), G6PD (20/119), SCN1A (15/119), and RYR1 (14/119). Regarding associated conditions, mutations linked to arrhythmias were found in 51 individuals, MH in 27, G6PDD in 20, DS in 15, and Porphyrias in 6. Conclusions: Incorporating genetic information on both common and rare clinically actionable variants into therapeutic decision-making has the potential to improve medication safety, reduce preventable ADRs, and enhance the effectiveness of personalized pharmacotherapy.

6
Low-level mosaic variants causing the pancreatic disease congenital hyperinsulinism can be detected from blood DNA

Bennett, J. J.; Laver, T. W.; Mannisto, J. M. E.; Houghton, J. A. L.; De Franco, E.; Kalyon, O.; Wright, S.; Johnson, A.-M.; De Leon, D. D.; Globa, E.; Kummer, S.; Banerjee, I.; Dastamani, A.; International Congenital Hyperinsulinism Consortium; Wakeling, M. N.; Johnson, M. B.; Flanagan, S. E.

2026-01-15 genetic and genomic medicine 10.64898/2026.01.13.26344002
Top 0.3%
37× avg

A substantial proportion of individuals with a well-defined monogenic disorder remain without a genetic diagnosis. Low-level mosaic pathogenic variants are increasingly recognised as an underappreciated cause of monogenic disease but are technically challenging to detect, particularly in organ-specific conditions when affected tissue is inaccessible. We systematically investigated low-level mosaic variants in individuals with congenital hyperinsulinism (CHI: n=1,252) or neonatal diabetes (NDM: n=312), two opposing pancreatic disorders of insulin secretion. We screened for established pathogenic variants with variant allele fraction (VAF) <8% in dominant CHI (ABCC8, GCK, GLUD1, HK1) or dominant NDM (ABCC8, KCNJ11, INS) genes in targeted next generation sequencing (tNGS) data using Mutect2. This called 40 variants across the four genes in 39 individuals with CHI. No candidate variants were found in the NDM cohort. Orthogonal validation of 35 variants using TaqMan-based droplet digital PCR (ddPCR) confirmed 26/35 variants. The median VAF for confirmed variants was 3.6% (1.1-7.8%), while false positives (9/35) predominantly had a VAF <1% with some overlap in VAF with true positives. This study shows that disease-causing low-level mosaic variants in dominant CHI genes can be detected in blood using tNGS but require orthogonal validation. These results provide a framework to improve diagnostic yield in organ-specific conditions where mosaic variants may represent an important missed cause of disease.

7
Plasma H3.1-Nucleosomes To Classify Severity And Surrogate Response To Treatment In Hidradenitis Suppurativa: A Cohort Study

Theohari, S.; Vlyssidou, A.; Kourtesa, A.; Anastasaki, A.; Kanni, T.; Messiri, P.; Giamarellos-Bourboulis, E.

2026-01-15 dermatology 10.64898/2026.01.13.26343988
Top 0.4%
36× avg

Background: Huge neutrophilic infiltrates within lesional and perilesional tissue in hidradenitis suppurativa (HS) give rise to the hypothesis that neutrophil extracellular trap (NET) formation may further drive systemic immune activation in HS. As intrinsic constituents of NETs, nucleosomes, particularly circulating nucleosomes containing Histone H3.1 (H3.1-nucleosomes), serve as reliable indicators of NETosis in the blood. Objectives: To investigate whether plasma H3.1-nucleosomes fluctuate with HS activity. Methods: Participants were adults with moderate to severe HS. Peripheral blood samples were collected at each visit; EDTA plasma was prepared under standardized conditions and stored at -80°C. Samples were subsequently analyzed for circulating H3.1-nucleosomes with a proprietary assay. Baseline and longitudinal data were evaluated in relation to HS severity, clinical characteristics and clinical response using both the HS clinical response score (HiSCR) and the at least 55% decrease of the international HS4 score (IHS4-55). Results: 93 patients were enrolled; serial measurements were available for 54. Patients were classified into two clusters, hyper-H3.1 and hypo-H3.1, based on the over-time kinetics of H3.1-nucleosomes. The hyper-H3.1 cluster is characterized by more severe disease. H3.1-nucleosomes of 24 ng/ml or more suggest a higher total count of inflammatory lesions and of draining tunnels. A decrease of more than 45% in H3.1-nucleosomes between visits is associated with higher chances of attaining HiSCR and IHS4-55 responses with biologicals targeting TNF, IL-1 and IL-17. Conclusions: Circulating H3.1-nucleosome levels reflect HS disease activity and surrogate response to treatment. What is already known about this topic? Biomarkers to provide a precision approach in hidradenitis suppurativa remain an unmet need. What does this study add? For the first time an easy-to-measure blood test is presented to classify patients and to act as a surrogate for treatment response. 
H3.1-nucleosomes distinguish patients into high and low levels of neutrophil activation. Over-time decreases of 45% or more indicate response to biological treatment. What is the translational message? The new blood test may be used to initiate trials in which guidance of both initiation and early stopping of treatment will be studied.

8
Morphology-Driven Inference of Patient-Specific Pathophysiological States Enables Precision Treatment in Chronic Spontaneous Urticaria

Seirin-Lee, S.; Hiraga, T.; Ishii, H.; Saito, R.; Matsubara, D.; Takahagi, S.; Hide, M.

2026-01-17 dermatology 10.64898/2026.01.15.26344235
Top 0.4%
36× avg

Skin diseases manifest as visually observable eruption patterns, making image-based assessment a central component of dermatological diagnosis. While recent artificial intelligence (AI)-based approaches have achieved remarkable progress in classifying skin diseases from images, their utility remains largely limited to pattern recognition tasks, such as disease identification or severity grading. Crucially, most existing AI frameworks operate as black-box classifiers and do not provide interpretable links between eruption morphology and the underlying in vivo pathophysiological states, thereby offering limited support for personalized treatment decisions. To date, no practical framework has been established to systematically translate eruption morphology into mechanistic insights or treatment-relevant predictions for inflammatory skin diseases such as chronic urticaria. Here, we propose a novel integrative framework that infers patient-specific pathophysiological states directly from skin eruption morphology. Our approach unifies mechanistic mathematical modeling with data science that encompasses machine learning and topological data analysis, together with in vitro experiments and clinical data into a single coherent system. By constructing a mathematical model that explicitly links disease pathophysiology to eruption morphology, we develop a computational parameter inference tool, the System for Skin Eruption Morphology-based Parameter Inference (SEMPi), that estimates patient-specific physiological parameters directly from real-world skin eruption images. Importantly, these inferred parameters are interpretable in terms of underlying biological processes, enabling direct insight into patient-specific disease states rather than mere image-level classification. 
Furthermore, by incorporating drug interactions into the mathematical model, our framework enables treatment-response prediction and optimization of individualized therapeutic strategies across multiple drugs. This study introduces a paradigm shift from morphology-based classification toward morphology-driven interpretation of patient physiology, providing a foundation for predictive diagnosis and precision treatment in inflammatory skin diseases.

9
Copy Number Analysis in Congenital Nevi: Concordance and Diagnostic Limitations of aCGH, sWGS, and Methylation Profiling

Karelin, A.; Brecht, I. B.; Pogoda, M.; Demidov, G.; Abele, M.; Schneider, D. T.; Aldea, D.; Etchevers, H. C.; Puig, S.; Hahn, M.; Forchhammer, S.

2026-03-03 dermatology 10.64898/2026.03.03.26347388
Top 0.5%
33× avg

Background: Distinguishing benign proliferative nodules (PNs) from melanoma arising within congenital melanocytic nevi remains a major diagnostic challenge. Copy number alteration (CNA) analysis is widely used to support classification, but current criteria were developed using array comparative genomic hybridization (aCGH). The performance of alternative platforms such as shallow whole-genome sequencing (sWGS) and methylation arrays in this setting is poorly defined. Objectives: The objective of this study is to compare CNA profiles obtained from aCGH, sWGS, and methylation arrays in atypical nodules arising within congenital nevi, and to correlate these molecular findings with clinical outcomes. Methods: Sixteen samples from fourteen patients were retrospectively analyzed using all three platforms. CNAs were cataloged, concordance across methods was quantified using the Jaccard index, and molecular classifications were compared. Clinical follow-up was reviewed to provide clinical context. Results: aCGH detected 39 CNAs, sWGS 60, and methylation profiling 66. Concordance was highest between sWGS and methylation (mean Jaccard 0.67), followed by aCGH versus sWGS (0.64) and aCGH versus methylation (0.49). Cases with high aneuploidy demonstrated strong cross-platform agreement, whereas low-burden lesions exhibited greater variability between methods. Divergent molecular classifications were observed in six cases. Conclusions: While all methods reliably detect broad chromosomal changes, sWGS and methylation arrays identify many additional focal CNAs that may not align with CGH-based diagnostic criteria. Until platform-specific thresholds are established, aCGH remains the most conservative and clinically validated approach for evaluating proliferative nodules in congenital nevi. Significance: Accurate molecular classification of melanocytic proliferations in congenital nevi is essential but challenging, particularly in patients with multiple proliferative nodules. 
This study provides the first systematic comparison of aCGH, sWGS, and methylation-based CNA profiling in this setting. We show that higher-resolution platforms detect substantially more focal aberrations, which can lead to discordant and potentially overcalled malignancy assessments when applying CGH-derived criteria. Our findings highlight the need for platform-adapted diagnostic frameworks and support continued use of CGH as the most conservative and clinically validated method for risk stratification.
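The cross-platform concordance figures in this abstract are Jaccard indices, that is, the size of the intersection of two call sets divided by the size of their union. The sketch below illustrates the calculation in Python. The CNA calls are hypothetical, and treating two calls as identical when their (chromosome arm, event) labels match is an assumption; the abstract does not describe how calls were matched across platforms.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index |A & B| / |A | B|; taken as 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical CNA calls for one sample, encoded as (chromosome arm, event).
acgh = {("1q", "gain"), ("9p", "loss")}
swgs = {("1q", "gain"), ("9p", "loss"), ("6q", "loss")}
methyl = {("1q", "gain"), ("6q", "loss"), ("11q", "loss")}

for label, x, y in [
    ("aCGH vs sWGS", acgh, swgs),
    ("sWGS vs methylation", swgs, methyl),
    ("aCGH vs methylation", acgh, methyl),
]:
    print(f"{label}: {jaccard(x, y):.2f}")
```

A per-sample index like this can then be averaged across samples to give a mean Jaccard value per platform pair, which is the form in which the abstract reports its results.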

10
A Global Atlas of Digital Dermatology to Map Innovation and Disparities

Groger, F.; Lionetti, S.; Gottfrois, P.; Gonzalez-Jimenez, A.; Groh, M.; Habermacher, L.; Labelling Consortium; Amruthalingam, L.; Pouly, M.; Navarini, A.

2025-12-29 dermatology 10.64898/2025.12.27.25342585
Top 0.6%
31× avg

The adoption of artificial intelligence in dermatology promises democratized access to healthcare, but model reliability depends on the quality and comprehensiveness of the data fueling these models. Despite rapid growth in publicly available dermatology images, the field lacks quantitative key performance indicators to measure whether new datasets expand clinical coverage or merely replicate what is already known. Here we present SkinMap, a multi-modal framework for the first comprehensive audit of the field's entire data basis. We unify the publicly available dermatology datasets into a single, queryable semantic atlas comprising more than 1.1 million images of skin conditions and quantify (i) informational novelty over time, (ii) dataset redundancy, and (iii) representation gaps across demographics and diagnoses. Despite exponential growth in dataset sizes, informational novelty across time has somewhat plateaued: some clusters, such as common neoplasms on fair skin, are densely populated, while underrepresented skin types and many rare diseases remain unaddressed. We further identify structural gaps in coverage: darker skin tones (Fitzpatrick V-VI) constitute only 5.8% of images and pediatric patients only 3.0%, while many rare diseases and phenotype combinations remain sparsely represented. SkinMap provides infrastructure to measure blind spots and steer strategic data acquisition toward undercovered regions of clinical space.

11
Self-reported health history from 70,724 individuals reveals novel HLA associations with allergy and other frequently underreported conditions

Boquett, J. A.; Lin, S. Y.-T.; House, J. S.; Ahn, K.; Suseno, R.; BakenRa, A.; Guthrie, K.; Wright, M.; Motsinger-Reif, A.; Maiers, M.; Hollenbach, J. A.

2026-02-19 genetic and genomic medicine 10.64898/2026.02.18.26346586
Top 0.6%
30× avg

Background: Variation in the HLA loci, located on human chromosome 6p, has been associated with hundreds of diseases and conditions. However, high levels of polymorphism that characterize the HLA system, coupled with generally modest effect sizes for most phenotypes, necessitate relatively large sample sizes to power association studies; meanwhile, high-resolution HLA genotyping remains relatively resource intensive. These constraints limit identification of novel associations. While phenome-wide association studies (PheWAS) in the context of large registries with available electronic health records (EHR) have revealed new insights into the role of HLA in disease, many common health conditions are poorly represented in EHR due to the temporal nature of their occurrence or general underreporting. Further, these studies have generally been conducted with HLA genotyping data imputed from microarrays, rather than direct measurement of high-resolution genotypes. Objective: To overcome these limitations and reveal novel HLA associations, we undertook a PheWAS in many previously understudied health conditions. Methods: We queried over 300 conditions, diseases and traits from 70,724 subjects registered with NMDP with available high-resolution HLA genotyping (HLA-A, HLA-B, HLA-C, HLA-DRB1, and HLA-DQB1). After stratifying according to ancestry, we performed a logistic regression analysis adjusting for sex and age for HLA-phenotype association. Results: We identified 48 significant HLA associations across ancestry groups, confirming several known associations and uncovering fifteen novel associations. Most novel associations pertained to common infectious or allergic phenotypes that often go under-reported in the EHR. Of particular translational importance, we identified a previously undetected yet very strong association between HLA-DRB1*04:01 and sensitivity to cefaclor, a specific class of cephalosporin (OR = 3.74, p-value 5.10E-28). 
Molecular docking simulations predict cefaclor binding in the P4 pocket of HLA-DRB1*04:01, with substantially greater affinity than non-associated antibiotics, including other cephalosporins. This pharmacogenomic signal highlights an opportunity for risk stratification and targeted prevention of adverse drug reactions. Other novel associations found, such as susceptibility to genital warts (HPV) and allergic rhinitis, reveal new insights into the role of specific HLA alleles in immune-mediated disease. The vast majority of these novel associations were replicated in the independent All of Us cohort, confirming the validity of this approach. Conclusion: Collectively, our findings demonstrate the value of integrating population-scale, high-resolution HLA genotypes with phenotyping beyond the EHR to reveal immunogenetic influences on common health outcomes. They also point to immediate translational avenues - particularly for drug hypersensitivity - while motivating future functional studies and prospective clinical validation to refine mechanistic understanding and clinical utility.

12
PHARMWATCH: A Multilayer Pharmacogenomics Safety System for Accurate Star Allele Interpretation

Eisenhart, C. E.; Brickey, R.; Mewton, J.

2026-02-28 genetic and genomic medicine 10.64898/2026.02.26.26347200
Top 0.6%
30× avg

The Clinical Pharmacogenetics Implementation Consortium (CPIC) bases its drug-gene recommendations on the assignment of star alleles, which map known genotypes to defined functional categories and corresponding drug dosage guidelines. The star allele framework, first proposed in 1996 for the CYP gene family and later formalized with CPIC's establishment in 2010 [1, 2], remains foundational to pharmacogenomics. However, this system has notable limitations. Its dependence on a restricted set of benchmark single nucleotide polymorphisms (SNPs) excludes rare or novel pathogenic variants that can invalidate a star allele call and lead to incorrect dosing recommendations. Furthermore, nearby non-pathogenic variants can interfere with haplotype interpretation, introducing additional risk of misclassification. To address these gaps, we developed PHARMWATCH, a multistep pharmacogenomics workflow for comprehensive variant analysis, allele tracking, and contextual interpretation. PHARMWATCH incorporates two algorithmic safeguards designed to identify genomic alterations that compromise star allele accuracy: (1) de novo germline variant screening using the ACMG-based BIAS-2015 classifier and (2) variant interpretation in context (VIIC) to validate the functional integrity of star allele-defining SNPs [3]. Together, these layers enhance the reliability of pharmacogenomic reporting, enabling safe, automated, and review-ready recommendations that extend beyond the constraints of traditional star allele-based approaches.

13
Unifying the communities of early-onset glycogen storage disease type IV and adult polyglucosan body disease through a genetic prevalence study of GBE1-related disease

Koch, R. L.; Akman, H. O.; Chown, E.; Goldman, D.; Levenson, J.; Lu, Q.; Michalovicz Gill, L. T.; Morgan, M.; Orthmann-Murphy, J.; Pires, N. T.; Reef, R.; Saxe, H.; Singer-Berk, M.; Baxter, S.

2025-12-17 genetic and genomic medicine 10.64898/2025.12.16.25342386
Top 0.6%
30× avg

Glycogen storage disease type IV (GSD IV) is an autosomal recessive disorder caused by pathogenic variants in GBE1, resulting in deficient glycogen branching enzyme (GBE) activity and formation of abnormal glycogen ("polyglucosan"). GSD IV manifests across a spectrum of clinical dimensions - including hepatic, neurologic, muscular, and cardiac involvement - which vary in severity. The early-onset forms, historically referred to as Andersen disease, present at different stages ranging from in utero to adolescence. The adult-onset form, referred to as adult polyglucosan body disease (APBD), typically presents in middle to late adulthood. To date, no epidemiological study of GSD IV has been performed. Understanding the global prevalence of GSD IV is critical to increase disease awareness, improve diagnostic rates, inform therapeutic development, and engage pharmaceutical companies. In collaboration with the Rare Genomes Project at the Broad Institute of MIT and Harvard and the APBD Research Foundation, this study curated variants in GBE1 and calculated prevalence across nine genetic ancestry groups. The estimated global carrier frequency of GSD IV is 1 in 243 individuals, and the global genetic prevalence is 1 in 235,784 individuals. Based on the 2024 world population, the estimated number of affected individuals with GSD IV is approximately 34,800. These estimates highlight a significant underdiagnosis of GSD IV and underscore the urgent need for increased awareness of this metabolic disorder. This model of collaboration between researchers, patient advocacy organizations, and genetic data sharing programs provides a framework for estimating the prevalence of other rare diseases in the global population. 
Graphical abstract: Figure 1. Created in BioRender. Koch, R. (2025) https://BioRender.com/j0sg30n.
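The carrier frequency, genetic prevalence, and affected-individual figures quoted in this abstract can be sanity-checked with Hardy-Weinberg arithmetic. The following is a rough sketch, not the study's method: it assumes a single pooled pathogenic allele frequency `q` (the study instead aggregates curated variants across nine ancestry groups, so agreement is only approximate), and it assumes a 2024 world population of about 8.2 billion, a figure the abstract does not state.

```python
import math


def pooled_allele_freq_from_carrier(carrier_freq: float) -> float:
    """Solve 2*q*(1-q) = carrier_freq for the smaller root q (Hardy-Weinberg)."""
    return (1.0 - math.sqrt(1.0 - 2.0 * carrier_freq)) / 2.0


carrier_freq = 1 / 243              # reported in the abstract
reported_prevalence = 1 / 235_784   # reported in the abstract
world_population = 8.2e9            # ASSUMPTION: not given in the abstract

# Under the single-pooled-allele simplification, prevalence of an
# autosomal recessive disease is q squared.
q = pooled_allele_freq_from_carrier(carrier_freq)
hw_prevalence = q * q

# Affected individuals implied by the reported prevalence.
affected = world_population * reported_prevalence

print(f"Hardy-Weinberg prevalence: about 1 in {1 / hw_prevalence:,.0f}")
print(f"Estimated affected individuals: about {affected:,.0f}")
```

Under these assumptions the simplified estimate lands close to the reported prevalence, and the affected count is consistent with the abstract's figure of approximately 34,800.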

14
Phenotype-first patient matching with SimPheny identifies diagnostic candidates beyond curated gene associations

Cooperstein, I. B.; Ward, A.; Kobren, S. N.; Lebleu, E.; Moore, B.; Spillmann, R. C.; Shashi, V.; Undiagnosed Diseases Network; Marth, G. T.

2026-01-17 genetic and genomic medicine 10.64898/2026.01.15.26344236
Top 0.6%
29× avg

Diagnostic tools for rare diseases typically rely on curated gene-phenotype associations and static disease models, limiting their effectiveness in cases with atypical presentations or previously uncharacterized disorders. To address these limitations, we present SimPheny, a phenotype-first algorithm for gene prioritization that operates independently of documented gene-phenotype associations. SimPheny identifies phenotypically similar diagnosed patients by comparing an undiagnosed patient's disease presentation to a reference cohort of diagnosed cases, and returns gene hypotheses by matching the undiagnosed patient's candidate gene list to the causative genes of similar patients using a statistical scoring model. Evaluated in diagnosed probands from the Undiagnosed Diseases Network (UDN) with the true diagnostic gene blinded, SimPheny consistently ranked the diagnostic gene among the top five candidates, outperforming existing tools, particularly for genes with limited gene-phenotype association data. When applied to previously unsolved UDN cases, clinical review confirmed that SimPheny's high-confidence causative gene predictions were diagnostic in nearly half of the analyzed cases. As the size of the diagnosed reference cohort increases, SimPheny's diagnostic reach expands without sacrificing ranking performance. By leveraging real patient data rather than curated models, SimPheny provides a generalizable, scalable framework for improving diagnostic yield in rare disease cohorts.

15
HSP90AA1 variants may contribute to autosomal dominant human male infertility

Wyrwoll, M. J. J.; MacLeod, D. M.; Salvarci, A.; Kliesch, S.; Okutman, O.; Viville, S.; Stallmeyer, B.; Tüttelmann, F.; O'Carroll, D.

2026-01-12 genetic and genomic medicine 10.64898/2026.01.08.26343516
Top 0.6%
29× avg

Study question: Do variants in HSP90AA1 cause human male infertility? Summary answer: Variants in HSP90AA1 appear as a possible autosomal dominant cause of human male infertility. What is known already: Male infertility is a highly heterogeneous condition, with over 300 genes described in this context so far. HSP90AA1 appears as a promising candidate gene for human male infertility, because the gene is highly conserved between species and knock-out of Hsp90aa1 in mice results in male-specific infertility due to azoospermia without further health implications. Study design, size, duration: We screened >2,300 infertile men for possibly pathogenic variants in HSP90AA1 and created a mouse line harbouring the homozygous missense variant c.605G>A p.(Arg202Lys). Participants/materials, setting, methods: Phenotypes of men with identified variants were determined based on semen analysis and testicular histology. Pathogenicity of detected variants was assessed using AlphaMissense and a mouse model. Male fertility of the mutant mouse line was analysed via plug-matings, histology and immunofluorescence staining (IF). Expression of HSP90AA1 in testicular tissue was assessed by IF. Main results and the role of chance: The mode of inheritance (MOI) in mice is autosomal recessive, but the constraint metrics (oe-score = 0.2, pLI = 1) and in silico prediction suggest that HSP90AA1 is an autosomal dominant gene in humans. We therefore screened for both heterozygous and biallelic variants in exome sequencing data of infertile men. While we did not detect any biallelic loss-of-function variants, we identified the homozygous missense variant c.605G>A p.(Arg202Lys) in an azoospermic man as a promising variant. This variant is extremely rare and affects a highly conserved amino acid. However, male homozygous mice with this variant are fertile with no differences in litter size and testicular size or histology, making it unlikely that this variant is the cause of the man's azoospermia. 
We therefore focused on heterozygous possibly pathogenic variants in HSP90AA1 and found a heterozygous frameshift variant in an azoospermic man with hypospermatogenesis as well as four heterozygous missense variants, predicted to affect protein function, in azoo- or cryptozoospermic men. Large scale data: N/A. Limitations, reasons for caution: Our findings suggest a dominant MOI in humans but currently cannot fully prove this. To further clarify the MOI and ultimately improve the clinical validity of HSP90AA1, replication of our findings in independent cohorts of infertile men as well as segregation analyses are required. Wider implications of the findings: While most human male infertility genes follow an autosomal recessive MOI, HSP90AA1 might be one of the few autosomal dominant infertility genes in humans. Differences in the MOI between humans and mice are also known from well-established infertility genes such as DMRT1. Study funding/competing interest(s): This work was supported by a German Research Foundation (DFG) fellowship (award WY 215/1-1 to MJW), the DFG-sponsored Clinical Research Unit Male Germ Cells (CRU326, project 329621271 to FT), and Wellcome Trust funding (225237 to DOC). This work was supported by funding for the Wellcome Discovery Research Platform for Hidden Cell Biology (226791).

16
Impact of proteogenomic evidence on clinical success

Karim, M. A.; Hukku, A.; Ariano, B.; Holzinger, E.; Tsepilov, Y.; Hayhurst, J.; Buniello, A.; McDonagh, E. M.; Castel, S. E.; Nelson, M. R.; Maranville, J.; Yerges-Armstrong, L.; Ghoussaini, M.

2026-02-25 genetic and genomic medicine 10.64898/2026.02.23.26346731
Top 0.7%
29× avg

We assessed the impact of plasma protein quantitative trait loci (pQTL) on therapeutic hypotheses backed by human genetic evidence. We show that pQTL-supported target-indication pairs were 4.7 times more likely to advance from Phase I to launch, compared to a 2.6-fold increase observed only with human genetic evidence. Moreover, pQTL-based enrichment was prominent in druggable protein families which had limited enrichment from human genetic evidence alone.

17
Deep Agentic Variant Prioritisation for Expert Level Genetic Diagnosis Fast at Scale

Kara, M.; Gungor, A. F.; Kuday, S. E.; Ozcelik, O.; Ozden, F.

2026-02-18 genetic and genomic medicine 10.64898/2026.02.17.26346421
Top 0.7%
29× avg

Genetic diagnosis remains a formidable challenge characterized by a diagnostic odyssey that spans years, with over half of rare disease patients remaining undiagnosed, affecting more than 300 million people on Earth. Clinicians must navigate through thousands of candidate variants against a noisy and fragmented literature landscape, a task that overwhelms human cognitive capacity and conventional decision-making approaches. Recent advances in agentic artificial intelligence systems have demonstrated superior performance in complex, multi-step reasoning tasks by systematically evaluating vast amounts of information, breaking down problems into manageable components, and adapting dynamically to new evidence. These capabilities align precisely with the requirements of genetic variant prioritization. Here we present DAVP (Deep Agentic Variant Prioritisation), a hierarchical agentic AI system that represents a major step forward in genetic diagnosis through patient-specific variant evaluation. Unlike traditional approaches that apply generic pathogenicity scores, DAVP evaluates each variant within the full context of the patient's clinical presentation, phenotypic profile, and genomic background. The system comprises three interconnected algorithmic components: prelimin8, a gene pre-screening algorithm that rapidly filters the genomic search space; inGeneTopMatch, a semantic knowledge graph algorithm that captures complex gene-phenotype-disease relationships; and elimin8, an in-context learning prioritization algorithm that dynamically ranks variants through iterative knowledge sorting and evidence synthesis. We conducted comprehensive benchmarks measuring diagnostic cumulative distribution function (CDF) recall based on top-k variant recommendations using simulation cases constructed with 1000 Genomes as healthy background genomes and variants from ClinVar as positive controls. 
DAVP demonstrates strong diagnostic performance superior to expert genetic clinicians while operating at orders of magnitude greater speed and scale. Our results demonstrate that agentic AI systems can transform rare disease diagnostics by combining the systematic evaluation capabilities of artificial intelligence with the nuanced clinical reasoning required for complex genetic diagnosis. This work lays the foundation for a new paradigm in AI-driven genetic medicine that could accelerate diagnosis, reduce healthcare costs, and improve patient outcomes worldwide. The source code and data to reproduce this work are available at https://github.com/Muti-Kara/davp.
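The benchmark metric mentioned in this abstract, recall as a cumulative function of the top-k recommendations, can be illustrated with a short sketch. This is not DAVP's evaluation code: the function names and the example ranks are hypothetical, and a case counts as a hit when its known causal variant appears at rank k or better.

```python
from typing import List, Optional, Sequence


def top_k_recall(ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of cases whose causal variant is ranked within the top k.

    ranks holds the 1-based rank of the true causal variant for each case,
    or None when the tool did not return that variant at all.
    """
    if not ranks:
        raise ValueError("at least one case is required")
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


def recall_curve(ranks: Sequence[Optional[int]], max_k: int) -> List[float]:
    """Recall at k = 1..max_k, i.e. an empirical CDF of the causal-variant rank."""
    return [top_k_recall(ranks, k) for k in range(1, max_k + 1)]


# Hypothetical ranks for six simulated cases (illustrative data only).
example_ranks = [1, 3, None, 2, 7, 1]
curve = recall_curve(example_ranks, 5)
for k, value in enumerate(curve, start=1):
    print(f"recall@{k} = {value:.2f}")
```

Because each case contributes at most one hit and hits accumulate as k grows, the curve is non-decreasing, which is what makes it readable as a CDF.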

18
Palette polygenic risk score framework improves risk prediction by capturing clinical heterogeneity of type 2 diabetes

Miyake, A.; Tanabe, H.; Narita, A.; Ojima, T.; Kyosaka, T.; Gocho, C.; Sakurai, R.; Takayama, J.; Yamakage, H.; Tanaka, K.; Kazama, J. J.; Satoh-Asahara, N.; Shimabukuro, M.; Tamiya, G.

2026-01-13 genetic and genomic medicine 10.64898/2026.01.12.25342123
Top 0.7%
29× avg

Polygenic risk scores (PRSs) are typically constructed under the assumption of a single, homogeneous disease phenotype. However, many common diseases exhibit considerable clinical heterogeneity and encompass multiple subtypes with distinct etiologies and clinical characteristics. As a result, conventional PRSs often overlook differences in underlying biological pathways among disease subtypes, limiting predictive accuracy and cross-ancestry transferability. To address this challenge, we propose the "palette PRS," a framework that integrates a set of partitioned polygenic scores (pPSs) for biologically interpretable pathways with subtype-specific weights. This approach can flexibly capture the relative contributions of multiple pathways within each individual and provides a unified risk score. We applied this framework to type 2 diabetes (T2D), a clinically highly heterogeneous disease. For T2D, previous machine learning-based studies have identified four distinct subtypes and 12 biologically interpretable pathways derived from 650 genome-wide significant variants. Building on these established findings, we employed an elastic net model incorporating subtype membership probabilities to derive a subtype-optimized palette PRS through the weighted integration of the pPSs of these 12 pathways. Our palette PRS showed superior predictive performance, with particularly high accuracy for the severe insulin-deficient diabetes (SIDD) subtype (AUC = 0.744), compared with both the conventional T2D PRS (AUC = 0.661) and the subtype-stratified GWAS-based PRS (AUC = 0.547). Moreover, our palette PRS exhibited substantial cross-ancestry transferability between East Asian and European populations. This strategy represents a major step toward clinically actionable, subtype-optimized risk prediction and personalized prevention in T2D worldwide.
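One way to picture the weighted integration described in this abstract is as a membership-weighted sum of pathway scores. The sketch below is an assumption about the functional form, not the authors' model: the abstract does not give the formula, the elastic net fitting step is omitted, and the function name, the weights, and the toy dimensions (two subtypes and three pathways instead of four and 12) are all hypothetical.

```python
from typing import Sequence


def palette_prs(
    pps: Sequence[float],
    membership: Sequence[float],
    weights: Sequence[Sequence[float]],
) -> float:
    """ASSUMED form: sum over subtypes s of p_s * sum over pathways j of w[s][j] * pPS_j.

    pps        -- partitioned polygenic scores, one per pathway
    membership -- subtype membership probabilities, one per subtype
    weights    -- hypothetical subtype-specific pathway weights (subtypes x pathways)
    """
    if len(membership) != len(weights):
        raise ValueError("one weight row is required per subtype")
    score = 0.0
    for p_s, w_s in zip(membership, weights):
        if len(w_s) != len(pps):
            raise ValueError("one weight is required per pathway")
        score += p_s * sum(w * x for w, x in zip(w_s, pps))
    return score


# Toy individual: three pathway scores and two subtype probabilities.
toy_pps = [1.0, -0.5, 0.2]
toy_membership = [0.75, 0.25]
toy_weights = [
    [0.5, 0.0, 1.0],  # hypothetical weights for subtype 1
    [0.0, 2.0, 0.0],  # hypothetical weights for subtype 2
]
toy_score = palette_prs(toy_pps, toy_membership, toy_weights)
print(f"toy palette PRS = {toy_score:.3f}")
```

In this form, an individual who most likely belongs to one subtype is scored mainly with that subtype's pathway weights, while still yielding a single unified score.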

19
Protein-based genomic analysis for the identification of risk loci associated with acute respiratory distress syndrome

Suarez-Pajes, E.; Rubio-Rodriguez, L. A.; Tosco-Herrera, E.; Ramirez-Falcon, M.; Gonzalez-Barbuzano, S.; Jasper, D.; Munoz-Barrera, A.; Hernandez-Beeftink, T.; Corrales, A.; Espinosa, E.; Dominguez, D.; Gonzalez-Montelongo, R.; Lorenzo-Salazar, J. M.; Garcia-Laorden, M. I.; Villar, J.; GEN-SEP study; Guillen-Guio, B.; Flores, C. N. S.

2026-01-16 genetic and genomic medicine 10.64898/2026.01.14.26344107
Top 0.7%
28× avg

Background: Acute respiratory distress syndrome (ARDS) is a life-threatening lung condition that requires admission to an intensive care unit (ICU). Sepsis is one of the leading causes of ARDS, and understanding protein regulation during sepsis could reveal key mechanisms that predispose patients to ARDS. We performed genome-wide association studies (GWAS) on ARDS biomarker levels to identify protein quantitative trait loci (pQTLs) and genes that could be associated with ARDS risk. Methods: GWAS were performed in 209 patients with sepsis from the GEN-SEP cohort to determine the association of imputed genotypes with 10 serum biomarker levels relevant to ARDS. Measurements were obtained by ELISA within the first 24 hours (T1), 48-72 hours (T2), and 7 days (T7) after the diagnosis of sepsis. We conducted a multi-trait analysis to aggregate the GWAS results for each biomarker at three time points. We prioritized genes in the significant loci (p < 5×10⁻⁸) and evaluated the association between rare variants and ARDS in whole-exome sequencing data from 272 patients with sepsis-associated ARDS and 550 sepsis controls from GEN-SEP. We analyzed the aggregated association of pQTLs with ICU mortality, multiple organ failure, and ARDS risk using polygenic scores (PGS) in independent patients (n = 621) from GEN-SEP. Results: We identified 27 significant independent loci and prioritized 56 genes. Seven of these were previously associated with respiratory infections and diseases (LINGO2, MC4R, MCTP1, NUAK1, PIEZO2, PTPRD, and TMEMc5). Defects in another prioritized gene, FOXN1, cause an inborn error of immunity. Rare variants in PTPRD, which was previously implicated in COVID-19 severity and pulmonary hypertension, were significantly associated with ARDS (p = 3.11×10⁻⁴). PGS of PAI-1 levels was significantly associated with ICU mortality. Conclusions: We prioritized genes of interest governing ARDS biomarker levels and identified PTPRD as a novel gene associated with ARDS risk. 
In addition, we demonstrate the value of biomarker PGS for predicting sepsis mortality. Graphical abstract: Figure 1.

20
Constructing a Literature-Derived Database for Benchmarking Polygenic Risk Score Construction Methods with Spectral Ranking Inferences

Sebastian, C.; Yu, M.; Jin, J.

2026-03-03 genetic and genomic medicine 10.64898/2026.03.01.26347258
Top 0.7%
28× avg

Polygenic risk scores (PRSs) have emerged as a valuable tool for genetic risk prediction and stratification in human diseases. Over the past decade, extensive methodological efforts have focused on improving the predictive power of PRS, leading to the development of numerous methods for PRS construction. Benchmarking these methods has thus become essential for guiding future PRS applications. While studies have benchmarked subsets of these methods on specific phenotypes and cohorts, the resulting evidence remains fragmented, and no work has comprehensively assessed the relative performance of the various PRS methods. In this study, we addressed this gap by systematically constructing a PRS method benchmarking database synthesizing published results from 2009 to 2025. We applied a spectral ranking inference framework with uncertainty quantification to rank 14 PRS methods that had been adequately compared against each other in the literature. We constructed rankings using two complementary sources: original method-development studies and applications/benchmarking studies. While the highest-ranked methods (LDpred2 and AnnoPred) and the lowest-ranked method (C+T) were consistently identified from both sources, the relative ordering of most methods showed moderate variability. We further constructed phenotype-specific rankings, providing more detailed insights into the robustness and phenotype-specific strengths of individual methods. Collectively, the overall and phenotype-specific rankings of the PRS methods, along with the curated benchmarking data from the literature, provide a dynamic and practical reference database that can be continually updated with emerging PRS methods and published benchmarking results to guide future PRS applications.
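To give a feel for how a ranking can be derived from scattered pairwise comparisons, here is a minimal sketch of Rank Centrality, a common spectral ranking estimator. It is a swapped-in illustration: the abstract does not say which estimator the authors use, the uncertainty quantification step is not reproduced, and the win counts below are invented. The idea is to build a random walk that tends to move from a method toward methods that beat it, then read scores off the stationary distribution.

```python
from typing import List, Sequence


def rank_centrality(wins: Sequence[Sequence[int]], iters: int = 1000) -> List[float]:
    """Spectral scores from pairwise counts, where wins[i][j] = times i beat j.

    Returns the stationary distribution of the comparison Markov chain;
    a higher score means a stronger item. Assumes the comparison graph is connected.
    """
    n = len(wins)
    degrees = [
        sum(1 for j in range(n) if j != i and wins[i][j] + wins[j][i] > 0)
        for i in range(n)
    ]
    d_max = max(degrees)
    if d_max == 0:
        raise ValueError("no pairwise comparisons were provided")

    # Transition i -> j is proportional to the fraction of comparisons j won against i.
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = wins[i][j] + wins[j][i]
            if i != j and total > 0:
                P[i][j] = wins[j][i] / total / d_max
        P[i][i] = 1.0 - sum(P[i])  # self-loop keeps each row summing to 1

    # Power iteration toward the stationary distribution.
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


# Invented comparison counts for three hypothetical methods A, B, C.
toy_wins = [
    [0, 8, 9],  # A beat B 8 times and C 9 times
    [2, 0, 7],  # B beat A 2 times and C 7 times
    [1, 3, 0],  # C beat A once and B 3 times
]
scores = rank_centrality(toy_wins)
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
print("scores:", [round(s, 3) for s in scores])
print("ranking (best first):", ["ABC"[i] for i in ranking])
```

Methods that were never compared directly can still be ordered through the methods they share comparisons with, which is useful when benchmarking evidence is spread across many studies.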